06 - Project 1

🧬 Project: Neural Classification of Erythrocyte Anomalies

1. Project Overview

In low-resource hematology settings, manual screening of blood smears for intracellular parasites is time-consuming and error-prone. This project aims to automate the triage process by developing a Deep Learning model capable of distinguishing between healthy erythrocytes (red blood cells) and those containing a specific intracellular pathogen.

Your task is to design, train, and validate a Convolutional Neural Network (CNN) to perform binary classification on single-cell images.

2. The Dataset

Dataset download link: Dataset

You are provided with a proprietary dataset consisting of segmented “patches” (Regions of Interest) extracted from thin blood smear slides stained with Giemsa. Each image contains a single cell.

The data has been anonymized and split into two distinct sets:

train: A folder containing approximately 22,000 labeled images.
- Sub-folder negative (class 0): Represents Healthy/Control samples.
- Sub-folder positive (class 1): Represents Infected/Anomalous samples.
test: A folder containing approximately 5,500 unlabeled images.
- You do not have the ground truth labels for this set.
- You will use this set to generate your final predictions for grading.
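
As a starting point, the folder layout described above can be turned into a list of labeled samples. The following is a minimal sketch using only the standard library. The function names `collect_labeled_paths` and `class_counts` and the list of image extensions are assumptions made for illustration, not part of the provided material.

```python
from pathlib import Path

# Label mapping taken from the folder names described above.
CLASS_DIRS = {"negative": 0, "positive": 1}
# Assumed image extensions; adjust to whatever the dataset actually contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def collect_labeled_paths(train_root):
    """Return a list of (path, label) pairs found under train_root."""
    samples = []
    for dir_name, label in CLASS_DIRS.items():
        class_dir = Path(train_root) / dir_name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples


def class_counts(samples):
    """Count samples per label, e.g. to check for class imbalance."""
    counts = {label: 0 for label in CLASS_DIRS.values()}
    for _, label in samples:
        counts[label] += 1
    return counts
```

Calling `class_counts(collect_labeled_paths("train"))` before training shows whether the two classes are roughly balanced, which affects how you read accuracy later.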

⚠️ Data Note: The images possess varying resolutions and aspect ratios. A crucial part of your pipeline will be establishing a robust pre-processing strategy to normalize these inputs before feeding them into your network.
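
One possible way to handle varying aspect ratios is to scale the longer side to the target size and pad the shorter side ("letterboxing"), so cells are not stretched. The sketch below only computes the geometry. The function name `letterbox_geometry` and the default target of 128 are assumptions for illustration; directly resizing to a square or center cropping are equally valid choices that you should justify in your report.

```python
def letterbox_geometry(width, height, target=128):
    """Compute the resized size and padding that fit an image into a square.

    Returns (new_width, new_height, pad_left, pad_top, pad_right, pad_bottom).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale so that the longer side becomes exactly `target` pixels.
    scale = target / max(width, height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    # Distribute the remaining pixels as padding on both sides.
    pad_w = target - new_w
    pad_h = target - new_h
    pad_left, pad_top = pad_w // 2, pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top
```

The returned values can be passed to whichever image library you use for the actual resize and pad operations.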

3. Technical Objectives

A. Data Pre-processing & Augmentation

Since the input dimensions vary, you must implement a pipeline to:

Resize/Rescale images to a fixed input size (e.g., \(64\times64\), \(128\times128\), or \(224\times224\)) suitable for your architecture.
Normalize pixel intensity values.
Implement Data Augmentation on the training set to prevent overfitting. Consider rotations, flips, and brightness adjustments to simulate varying lighting conditions in microscopy.
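
The normalization and augmentation steps above can be sketched with NumPy on an image that has already been resized to a square. This is an illustrative example, not a required implementation: the function names, the brightness range of ±0.2, and the restriction to right-angle rotations are assumptions. Framework transforms (for example those in torchvision) offer arbitrary-angle rotations and more.

```python
import numpy as np


def normalize(image_uint8):
    """Scale an HxWx3 uint8 image to float32 values in [0, 1]."""
    return image_uint8.astype(np.float32) / 255.0


def augment(image, rng, max_brightness_shift=0.2):
    """Apply a random flip, a random 90-degree rotation, and a brightness shift.

    `image` is a square HxWx3 float array in [0, 1]; `rng` is a numpy Generator.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]          # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :, :]          # vertical flip
    k = int(rng.integers(0, 4))            # rotate by 0, 90, 180, or 270 degrees
    image = np.rot90(image, k=k, axes=(0, 1))
    # Additive brightness shift (assumed range), clipped back into [0, 1].
    shift = rng.uniform(-max_brightness_shift, max_brightness_shift)
    return np.clip(image + shift, 0.0, 1.0).astype(np.float32)
```

Apply `augment` only to training samples; validation and test images should go through `normalize` alone so that evaluation is deterministic.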

B. Neural Network Architecture

You are required to construct a Convolutional Neural Network. You may choose one of two paths:

Custom Architecture: Design your own stack of Convolutional, Max-Pooling, and Dense layers. You must justify your choice of kernel sizes and depth.
Transfer Learning: Utilize a pre-trained backbone (e.g., VGG-16, ResNet-18, MobileNet) with a custom classification head. If you choose this, you must explain your freezing/unfreezing strategy.
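
When justifying kernel sizes and depth for a custom architecture, it can help to trace how the feature-map size and receptive field change from layer to layer. The sketch below does this arithmetic with the standard output-size formula. The helper names and the three-block example stack are hypothetical and are not a recommended architecture.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_stack(input_size, layers):
    """Follow feature-map size and receptive field through a layer stack.

    `layers` is a list of (name, kernel, stride, padding) tuples.
    Returns a list of (name, output_size, receptive_field) tuples.
    """
    size, rf, jump = input_size, 1, 1
    report = []
    for name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        if size < 1:
            raise ValueError(f"feature map vanished at layer {name!r}")
        rf += (kernel - 1) * jump   # receptive field grows by (k - 1) * jump
        jump *= stride              # input-pixel distance between neighbouring outputs
        report.append((name, size, rf))
    return report


# Hypothetical stack: 3x3 convolutions (padding 1), each followed by 2x2 max-pooling.
EXAMPLE_STACK = [
    ("conv1", 3, 1, 1), ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 1), ("pool2", 2, 2, 0),
    ("conv3", 3, 1, 1), ("pool3", 2, 2, 0),
]

for name, size, rf in trace_stack(64, EXAMPLE_STACK):
    print(f"{name}: {size}x{size}, receptive field {rf}x{rf}")
```

Comparing the final receptive field with the typical size of a cell in the resized image gives one concrete argument for or against adding more layers.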

C. Training Loop

Loss Function: Select a loss function appropriate for binary classification.
Optimizer: Use an adaptive optimizer or SGD with momentum.
Validation: You must split the provided train set further to create your own internal validation set (e.g., an 80/20 split) so that you can monitor the training and validation loss curves and stop training before overfitting occurs.
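
The three pieces above are independent of any particular framework, so they can be sketched in plain Python. The example below shows a stratified 80/20 split, binary cross-entropy as one loss that fits binary classification, and a simple early-stopping monitor. All names, the patience of 5 epochs, and the fixed seed are assumptions for illustration. In practice you would normally use your framework's built-in loss (for example `torch.nn.BCEWithLogitsLoss` in PyTorch) rather than this reference version.

```python
import math
import random


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (item, label) pairs into train/val lists, preserving class ratios."""
    by_label = {}
    for item, label in samples:
        by_label.setdefault(label, []).append((item, label))
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    rng.shuffle(train)
    return train, val


def binary_cross_entropy(probabilities, labels, eps=1e-7):
    """Mean binary cross-entropy for predicted P(class 1) and 0/1 labels."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Inside a training loop, call `stopper.step(val_loss)` once per epoch and break when it returns `True`, keeping the weights from the epoch with the lowest validation loss.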

4. Deliverables

Part 1: Short report

Your short report should be a PDF document containing the following information:

Neural network architecture, loss function, optimizer, and hyperparameters used.
Visualizations produced with Captum or a similar library (e.g., Grad-CAM) that interpret the model's decisions on sample images.

Part 2: Test set predictions

You must run your final, trained model on the images in the test folder.

Generate a CSV file named submission.csv.
The file must contain a header row and two columns: filename and prediction (0 for negative, 1 for positive).
Ensure the filenames match those in the test folder exactly.
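
A minimal sketch of writing and sanity-checking the file with the standard library follows. It assumes that the filename column holds the bare file name including its extension and that predicted probabilities are thresholded at 0.5. Neither detail is specified above, so confirm both before submitting. The function names are hypothetical.

```python
import csv
from pathlib import Path


def write_submission(predictions, out_path="submission.csv", threshold=0.5):
    """Write a two-column CSV with a header row.

    `predictions` maps each test filename to the model's P(positive).
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "prediction"])
        for filename in sorted(predictions):
            label = 1 if predictions[filename] >= threshold else 0
            # Assumption: only the bare file name is expected, not a full path.
            writer.writerow([Path(filename).name, label])


def check_submission(csv_path, test_dir):
    """Return (missing files, unexpected files, rows with invalid labels)."""
    expected = {p.name for p in Path(test_dir).iterdir() if p.is_file()}
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    listed = {row["filename"] for row in rows}
    bad_labels = [row for row in rows if row["prediction"] not in {"0", "1"}]
    return expected - listed, listed - expected, bad_labels
```

Running `check_submission("submission.csv", "test")` before uploading should return two empty sets and an empty list.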